自动驾驶汽车(AV)必须在动态环境中安全有效地操作。为此,配备联合雷达通信(JRC)功能的AVS可以通过使用雷达检测和数据通信功能来增强驾驶安全性。但是,在不确定性和周围环境的动态下,通过两种不同功能优化AV系统的性能非常具有挑战性。在这项工作中,我们首先提出一个基于马尔可夫决策过程(MDP)的智能优化框架,以帮助AV在周围环境的动态和不确定性下选择JRC操作功能时做出最佳决策。然后,我们开发了一种有效的学习算法,利用了深度强化学习技术的最新进展,以找到AV的最佳政策,而无需任何有关周围环境的先前信息。此外,为了使我们提出的框架更加可扩展,我们开发了一种转移学习(TL)机制,该机制使AV能够利用有价值的体验来加速培训过程,以加速培训过程。广泛的模拟表明,与其他常规的深钢筋学习方法相比,提议的可转移深钢筋学习框架可将AV的障碍检测概率降低到67%。
translated by 谷歌翻译
Inverse medium scattering solvers generally reconstruct a single solution without an associated measure of uncertainty. This is true both for the classical iterative solvers and for the emerging deep learning methods. But ill-posedness and noise can make this single estimate inaccurate or misleading. While deep networks such as conditional normalizing flows can be used to sample posteriors in inverse problems, they often yield low-quality samples and uncertainty estimates. In this paper, we propose U-Flow, a Bayesian U-Net based on conditional normalizing flows, which generates high-quality posterior samples and estimates physically-meaningful uncertainty. We show that the proposed model significantly outperforms the recent normalizing flows in terms of posterior sample quality while having comparable performance with the U-Net in point estimation.
translated by 谷歌翻译
Graph Neural Networks (GNNs) had been demonstrated to be inherently susceptible to the problems of over-smoothing and over-squashing. These issues prohibit the ability of GNNs to model complex graph interactions by limiting their effectiveness at taking into account distant information. Our study reveals the key connection between the local graph geometry and the occurrence of both of these issues, thereby providing a unified framework for studying them at a local scale using the Ollivier's Ricci curvature. Based on our theory, a number of principled methods are proposed to alleviate the over-smoothing and over-squashing issues.
translated by 谷歌翻译
Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that draw confusion to state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, there are not many datasets for researching handwriting recognition in Vietnamese, which makes handwriting recognition in this language have a barrier for researchers to approach. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset constructed by connecting pen stroke coordinates without further processing. This approach obviously can not measure the ability of recognition methods effectively, as it is trivial and may be lack of features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that associates crucial natural attributes required for offline handwriting images. Using our method, we provide a first high-quality synthetic dataset which is complex and natural for efficiently evaluating handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to figure out the challenge to reach the solution for handwriting recognition in Vietnamese.
translated by 谷歌翻译
Image captioning is currently a challenging task that requires the ability to both understand visual information and use human language to describe this visual information in the image. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based method by extending Object Relation Transformer architecture with Attention on Attention mechanism. Experiments on the VieCap4H dataset show that our proposed method significantly outperforms its original structure on both the public test and private test of the Image Captioning shared task held by VLSP.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Robots have been brought to work close to humans in many scenarios. For coexistence and collaboration, robots should be safe and pleasant for humans to interact with. To this end, the robots could be both physically soft with multimodal sensing/perception, so that the robots could have better awareness of the surrounding environment, as well as to respond properly to humans' action/intention. This paper introduces a novel soft robotic link, named ProTac, that possesses multiple sensing modes: tactile and proximity sensing, based on computer vision and a functional material. These modalities come from a layered structure of a soft transparent silicon skin, a polymer dispersed liquid crystal (PDLC) film, and reflective markers. Here, the PDLC film can switch actively between the opaque and the transparent state, from which the tactile sensing and proximity sensing can be obtained by using cameras solely built inside the ProTac link. In this paper, inference algorithms for tactile proximity perception are introduced. Evaluation results of two sensing modalities demonstrated that, with a simple activation strategy, ProTac link could effectively perceive useful information from both approaching and in-contact obstacles. The proposed sensing device is expected to bring in ultimate solutions for design of robots with softness, whole-body and multimodal sensing, and safety control strategies.
translated by 谷歌翻译
We introduce efficient deep learning-based methods for legal document processing including Legal Document Retrieval and Legal Question Answering tasks in the Automated Legal Question Answering Competition (ALQAC 2022). In this competition, we achieve 1\textsuperscript{st} place in the first task and 3\textsuperscript{rd} place in the second task. Our method is based on the XLM-RoBERTa model that is pre-trained from a large amount of unlabeled corpus before fine-tuning to the specific tasks. The experimental results showed that our method works well in legal retrieval information tasks with limited labeled data. Besides, this method can be applied to other information retrieval tasks in low-resource languages.
translated by 谷歌翻译
成功的人工智能系统通常需要大量标记的数据来从文档图像中提取信息。在本文中,我们研究了改善人工智能系统在理解文档图像中的性能的问题,尤其是在培训数据受到限制的情况下。我们通过使用加强学习提出一种新颖的填充方法来解决问题。我们的方法将信息提取模型视为策略网络,并使用策略梯度培训来更新模型,以最大程度地提高补充传统跨凝结损失的综合奖励功能。我们使用标签和专家反馈在四个数据集上进行的实验表明,我们的填充机制始终提高最先进的信息提取器的性能,尤其是在小型培训数据制度中。
translated by 谷歌翻译
尽管在过去的几年中取得了重大进展,但歧义仍然是面部表情识别(FER)的关键挑战。它可能导致嘈杂和不一致的注释,这阻碍了现实世界中深度学习模型的性能。在本文中,我们提出了一种新的不确定性标签分布学习方法,以提高深层模型的鲁棒性,以防止不确定性和歧义。我们利用价值空间中的邻里信息来适应培训训练样本的情绪分布。我们还考虑提供的标签将其纳入标签分布时的不确定性。我们的方法可以轻松地集成到深层网络中,以获得更多的培训监督并提高识别准确性。在各种嘈杂和模棱两可的环境下,在几个数据集上进行了密集的实验表明,我们的方法取得了竞争成果,并且超出了最新的最新方法。我们的代码和模型可在https://github.com/minhnhatvt/label-distribution-learning-fer-tf上找到。
translated by 谷歌翻译